- Performance issues
- ==================
-
- The StrongARM has significantly different performance characteristics to
- older ARM processors. It is clocked 5 times faster than any previous ARM,
- and many instructions execute in fewer cycles. In particular:
-
- * B/BL take 2 cycles, rather than 3
- * MOV PC,Rn and ADD PC,PC,Rn,LSL #2 etc take 2 cycles rather than 3
- * LDR takes 2 cycles (from the cache) rather than 3, and will take
- only 1 cycle if the result is not used in the next instruction.
- * STR takes 1 cycle rather than 2, if the write buffer isn't full
- * MUL/MLA take 1-3 cycles rather than 2-17 cycles.
- * Many instructions will in fact take only one cycle provided the
- result is not used in the next instruction.
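-
- The load-use point above can be illustrated with a minimal C sketch. The
- function name sum_words is hypothetical, and a compiler is free to
- reschedule the code however it likes; the sketch only shows the shape of
- a loop in which each load is issued one step before its result is
- consumed, so the loaded value is not needed by the very next instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example: sum n words, keeping each load one step ahead of
   the add that consumes it. */
uint32_t sum_words(const uint32_t *p, size_t n)
{
    uint32_t sum = 0;
    uint32_t cur;

    assert(p != NULL || n == 0);
    if (n == 0)
        return 0;

    cur = p[0];                    /* first load, consumed on a later step */
    for (size_t i = 1; i < n; i++) {
        uint32_t next = p[i];      /* issue the next load ... */
        sum += cur;                /* ... while using the previous value */
        cur = next;
    }
    return sum + cur;              /* consume the final loaded value */
}
```

- Whether this helps in practice depends on the compiler and on the data
- being in the cache; in hand-written assembler the same idea applies
- directly to the ordering of LDR and the instruction that uses its result.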
-
- For fuller information see the StrongARM Technical Reference Manual,
- available from Digital Semiconductor's WWW site (currently at
- http://www.digital.com/info/semiconductor/dsc-strongarm.html).
-
- The StrongARM's cache and write buffer are also significantly better than
- those of previous ARMs, allowing an average fivefold speed increase despite
- the unaltered system bus. Moving large amounts of data is still limited by
- the system bus, but the write buffer can be exploited to interleave a large
- amount of processing with memory accesses. For example, on StrongARM it is
- quicker to plot a 4bpp sprite to a 32bpp mode than to plot a 32bpp sprite
- to a 32bpp mode; the latter case is pure data transfer, while the former
- involves less data transfer, with the processing interleaved (ie
- effectively free).
-
- The long cache lines of the ARM710 and StrongARM can impact performance.
- A random read or instruction fetch from a cached area will load 8 words
- (32 bytes) into the cache. This can make traversal of a long linked list
- inefficient, since each node visited may pull in a whole line of which
- only a word or two is used. It is also often worth aligning code to an
- 8-word boundary. In current versions of RISC OS, modules are loaded at an
- address 16*n+4. Future versions of RISC OS will probably load modules at
- an address 32*n+4, so it is worth aligning your service call entries
- appropriately in preparation for this change.
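-
- The alignment arithmetic can be sketched as follows. The helper name and
- its parameters are hypothetical; it simply computes how many bytes of
- padding to place before an entry point at a given offset within a module,
- so that the entry lands on a cache-line boundary once the module is
- loaded at an address of the form line_bytes*n + load_bias.

```c
#include <assert.h>

/* Hypothetical helper: padding (in bytes) needed before an entry at
   `offset` bytes from the module start so that its final address is a
   multiple of line_bytes, given a load address of line_bytes*n + load_bias.
   line_bytes must be a power of two. */
unsigned pad_to_cache_line(unsigned offset, unsigned load_bias,
                           unsigned line_bytes)
{
    unsigned mask;

    assert(line_bytes != 0 && (line_bytes & (line_bytes - 1)) == 0);
    mask = line_bytes - 1;

    /* Distance from (load_bias + offset) up to the next line boundary,
       or 0 if it is already on one. */
    return (line_bytes - ((load_bias + offset) & mask)) & mask;
}
```

- For a module loaded at 32*n+4, an entry at module offset 28 is already on
- an 8-word boundary, while an entry at offset 0 would need 28 bytes of
- padding before it.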
-
- Two significant disadvantages of the StrongARM compared with previous
- processors are:
-
- 1) Burst reads are not performed from uncached areas. In particular
- this means that reads from the screen are slower on the StrongARM
- than on previous ARMs. A future version of RISC OS may address this
- by marking the screen cacheable before reading (eg in a block copy
- operation). Also, burst writes are not performed to unbuffered
- areas.
-
- 2) Code modification is expensive. You can modify code, but a
- complete SynchroniseCodeAreas can take of the order of half
- a millisecond (ie 100000 processor cycles) to execute, and will
- flush the entire instruction cache. Thus use of self-modifying
- code is strongly deprecated; a static alternative will almost
- always be faster. Synchronisation of a single word (eg modifying
- a hardware vector) is cheaper (of the order of 100 processor
- cycles) but still requires the whole instruction cache to be
- flushed.
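-
- One possible static alternative to patching code at run time is to
- select behaviour through a table of function pointers held in data. The
- sketch below is illustrative only: the names plot_row, plot_ops and the
- three plot actions are hypothetical and are not part of any RISC OS
- interface. Changing the selected action modifies no instructions, so no
- code synchronisation or instruction cache flush is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical plot actions, chosen at run time by index rather than by
   rewriting an instruction inside the loop. */
typedef uint32_t (*plot_op)(uint32_t dst, uint32_t src);

static uint32_t op_store(uint32_t dst, uint32_t src) { (void)dst; return src; }
static uint32_t op_or(uint32_t dst, uint32_t src)    { return dst | src; }
static uint32_t op_xor(uint32_t dst, uint32_t src)   { return dst ^ src; }

enum { PLOT_STORE, PLOT_OR, PLOT_XOR, PLOT_OP_COUNT };

static const plot_op plot_ops[PLOT_OP_COUNT] = { op_store, op_or, op_xor };

/* Apply the selected action to n words; the selection is a data lookup. */
void plot_row(uint32_t *dst, const uint32_t *src, size_t n, unsigned op)
{
    plot_op f;

    assert(op < PLOT_OP_COUNT);
    f = plot_ops[op];
    for (size_t i = 0; i < n; i++)
        dst[i] = f(dst[i], src[i]);
}
```

- The indirect call has a cost of its own, so for very hot loops it may be
- better to provide one complete loop per action and select between the
- loops, which is still a static choice.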
-
- Note that future processors will no doubt have different performance
- characteristics again; you shouldn't optimise your code too heavily for
- one particular architecture at the expense of others. Even so, the points
- above should give you a clearer idea of how to get good performance from
- your StrongARM.
-